
pmclust (version 0.2-1)

EM-like algorithms: EM-like Steps for GBD

Description

The EM-like algorithm for model-based clustering of finite mixture Gaussian models with unstructured dispersions.

Usage

em.step(PARAM.org)
aecm.step(PARAM.org)
apecm.step(PARAM.org)
apecma.step(PARAM.org)
kmeans.step(PARAM.org)

Arguments

PARAM.org

an original set of parameters generated by set.global.

Value

The converged result is returned as a new list containing all updated parameters that describe the components of the model. See the help page of PARAM or PARAM.org for details.

Details

A global variable called X.spmd should exist in the .pmclustEnv environment, usually the working environment. X.spmd is the data matrix to be clustered, and it has dimension N.spmd by p.
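
The N.spmd by p layout means each processor holds only its own block of rows. The following base R sketch illustrates why that layout is convenient: a global quantity such as the overall mean can be rebuilt from per-block sums. It simulates the processors with a list inside a single R session; n.ranks, X.blocks, local.sum, and global.mean are hypothetical names chosen for illustration, and the sketch does not show how pmclust itself combines results.

```r
# Simulate several SPMD ranks in one R session (illustration only).
set.seed(1)
n.ranks <- 4     # hypothetical stand-in for the number of processors
N.spmd <- 5      # rows held by each simulated rank
p <- 2           # number of variables

# Each simulated rank owns one N.spmd by p block of the data.
X.blocks <- lapply(seq_len(n.ranks), function(r) {
  matrix(rnorm(N.spmd * p), nrow = N.spmd, ncol = p)
})

# Per-rank sufficient statistics: column sums and row counts.
local.sum <- lapply(X.blocks, colSums)
local.n <- sapply(X.blocks, nrow)

# Combine the per-rank pieces; in a real SPMD run this would be a reduction
# across processors rather than a loop over a list.
N <- sum(local.n)
global.mean <- Reduce(`+`, local.sum) / N

# Reference answer computed from the stacked N by p matrix.
X.all <- do.call(rbind, X.blocks)
```

Here global.mean agrees with colMeans(X.all) up to floating-point error, although no single block ever sees the full matrix.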

PARAM.org is a local variable inside all of the EM-like functions em.step, aecm.step, apecm.step, apecma.step, and kmeans.step. This variable is a list containing all parameters related to the model. Each function updates these parameters with its EM-like algorithm and returns the converged result. The list elements are initially generated by set.global.
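
To make the parameter update concrete, here is a minimal serial sketch of one plain EM iteration for a Gaussian mixture with unstructured dispersions, written in base R. It is not pmclust's implementation: the names log.dmvnorm, em.iter, eta, mu, and Sigma are assumptions made for this illustration, the data are held in one ordinary matrix rather than in X.spmd blocks, and the AECM, APECM, and APECMa variants are not covered.

```r
# Log-density of a multivariate normal for every row of X (base R only).
log.dmvnorm <- function(X, mu, Sigma) {
  p <- ncol(X)
  U <- chol(Sigma)                                 # Sigma = t(U) %*% U
  Z <- backsolve(U, t(X) - mu, transpose = TRUE)   # solves t(U) z = x - mu
  -0.5 * (p * log(2 * pi) + 2 * sum(log(diag(U))) + colSums(Z^2))
}

# One EM iteration: eta are mixing proportions, mu and Sigma are lists of
# component means and full (unstructured) covariance matrices.
em.iter <- function(X, eta, mu, Sigma) {
  N <- nrow(X)
  K <- length(eta)

  # E-step: posterior probabilities, using log-sum-exp for stability.
  logW <- sapply(seq_len(K), function(k) {
    log(eta[k]) + log.dmvnorm(X, mu[[k]], Sigma[[k]])
  })
  m <- apply(logW, 1, max)
  lse <- m + log(rowSums(exp(logW - m)))
  Z <- exp(logW - lse)

  # M-step: weighted proportions, means, and covariance matrices.
  Nk <- colSums(Z)
  for (k in seq_len(K)) {
    mu[[k]] <- colSums(Z[, k] * X) / Nk[k]
    Xc <- sweep(X, 2, mu[[k]])
    Sigma[[k]] <- crossprod(Xc * Z[, k], Xc) / Nk[k]
  }

  # logL is the log-likelihood at the parameters passed in.
  list(eta = Nk / N, mu = mu, Sigma = Sigma, logL = sum(lse))
}

# Usage: two well-separated clusters, 20 iterations from a crude start.
set.seed(123)
X <- rbind(matrix(rnorm(400, mean = 0), ncol = 2),
           matrix(rnorm(600, mean = 4), ncol = 2))
fit <- list(eta = c(0.5, 0.5),
            mu = list(X[1, ], X[nrow(X), ]),
            Sigma = list(diag(2), diag(2)))
logL <- numeric(20)
for (i in seq_along(logL)) {
  fit <- em.iter(X, fit$eta, fit$mu, fit$Sigma)
  logL[i] <- fit$logL
}
```

The recorded log-likelihood should never decrease from one iteration to the next, which is the usual sanity check for an EM implementation. A convergence rule, for example stopping once the increase falls below a tolerance, would replace the fixed iteration count in practice.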

References

Programming with Big Data in R Website: https://pbdr.org/

Chen, W.-C. and Maitra, R. (2011) “Model-based clustering of regression time series data via APECM -- an AECM algorithm sung to an even faster beat”, Statistical Analysis and Data Mining, 4, 567-578.

Chen, W.-C., Ostrouchov, G., Pugmire, D., Prabhat, M., and Wehner, M. (2013) “A Parallel EM Algorithm for Model-Based Clustering with Application to Explore Large Spatio-Temporal Data”, Technometrics, (revision).

Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977) “Maximum Likelihood from Incomplete Data via the EM Algorithm”, Journal of the Royal Statistical Society Series B, 39, 1-38.

Lloyd, S. P. (1982) “Least squares quantization in PCM”, IEEE Transactions on Information Theory, 28, 129-137.

Meng, X.-L. and Van Dyk, D. (1997) “The EM Algorithm -- an Old Folk-song Sung to a Fast New Tune”, Journal of the Royal Statistical Society Series B, 59, 511-567.

See Also

set.global, mb.print.

Examples

# NOT RUN {
# Save code in a file "demo.r" and run in 4 processors by
# > mpiexec -np 4 Rscript demo.r

### Setup environment.
library(pmclust, quietly = TRUE)
comm.set.seed(123)

### Generate example data.
N.allspmds <- rep(5000, comm.size())
N.spmd <- 5000
N.K.spmd <- c(2000, 3000)
N <- 5000 * comm.size()
p <- 2
K <- 2
data.spmd <- generate.basic(N.allspmds, N.spmd, N.K.spmd, N, p, K)
X.spmd <- data.spmd$X.spmd

### Run clustering.
PARAM.org <- set.global(K = K)          # Set global storages.
# PARAM.org <- initial.em(PARAM.org)    # One initial.
PARAM.org <- initial.RndEM(PARAM.org)   # Ten initials by default.
PARAM.new <- apecma.step(PARAM.org)     # Run APECMa.
em.update.class()                       # Get classification.

### Get results.
N.CLASS <- get.N.CLASS(K)
comm.cat("# of class:", N.CLASS, "\n")

### Quit.
finalize()
# }
